


On Simple Linear Regression


FLATIRON SCHOOL


Why Linear Regression?





Linear regression (LR) is a fundamental tool in the
data scientist's kit.


Practically speaking, using it is 
one or two lines of code. 


But it's crucial for us to
understand the theory
underlying it: it is a
building block for more
complex tools.


We are better data scientists if 
we understand both HOW and 
WHY 





- Simple: functions of a single variable, $Y = f(X)$
- Linear: models are lines
- Regression: the dependent variable is continuous-valued


- As population density increases, so do housing prices.
- As the number of trees decreases, the concentration of CO₂ goes up.
- Predictions for all values of the X variable
- Model shape: $\hat{y} = \beta_1 x + \beta_0$
- Error as the distance between real and predicted values: $e_i = y_i - \hat{y}_i$
- Which of these lines fits the data best?
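
One common way to make "fits best" concrete is to score each candidate line by its sum of squared errors (SSE) and prefer the smallest. The sketch below assumes that criterion. The data points and the three candidate lines are hypothetical, chosen only for illustration, and the names `residuals` and `sse` are not from the slides.

```python
# Hypothetical data and candidate lines, chosen only for illustration.
points = [(1, 1.5), (2, 3.5), (3, 4.5), (4, 7.0)]
candidates = {
    "flat": (0.0, 4.0),       # (slope, intercept)
    "steep": (3.0, -3.0),
    "moderate": (1.7, -0.1),
}


def residuals(slope, intercept, pts):
    """Error for each point: real value minus predicted value."""
    return [y - (slope * x + intercept) for x, y in pts]


def sse(slope, intercept, pts):
    """Sum of squared errors for one candidate line."""
    return sum(e ** 2 for e in residuals(slope, intercept, pts))


# Score every candidate and pick the one with the smallest SSE.
scores = {name: sse(m, b, points) for name, (m, b) in candidates.items()}
best_name = min(scores, key=scores.get)
print(best_name)
```

Squaring the errors keeps positive and negative misses from cancelling and penalizes large misses more heavily than small ones.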
- Construct the best-fit line for the points (1, 2), (3, 9), and (5, 10).


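- Remember:

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$$

$$\beta_1 = r \, \frac{\sigma_y}{\sigma_x}$$

$$\beta_0 = \bar{y} - \beta_1 \bar{x}$$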
- Calculate $\bar{x}$ and $\bar{y}$:

$$\bar{x} = \frac{1 + 3 + 5}{3} = 3$$

$$\bar{y} = \frac{2 + 9 + 10}{3} = 7$$

$x_i$: [1, 3, 5]
$y_i$: [2, 9, 10]


- Calculate these products:

$$\sum_i (x_i - \bar{x})(y_i - \bar{y}) = (1 - 3)(2 - 7) + (3 - 3)(9 - 7) + (5 - 3)(10 - 7) = 16$$

$$\sum_i (x_i - \bar{x})^2 = (1 - 3)^2 + (3 - 3)^2 + (5 - 3)^2 = 8$$

$$\sum_i (y_i - \bar{y})^2 = (2 - 7)^2 + (9 - 7)^2 + (10 - 7)^2 = 38$$
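
The means and the three sums can be checked with a few lines of plain Python. This is a minimal sketch; the variable names `s_xy`, `s_xx`, and `s_yy` are shorthand introduced here, not notation from the slides.

```python
xs = [1, 3, 5]
ys = [2, 9, 10]
n = len(xs)

# Means of x and y.
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sum of cross-products and sums of squared deviations.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

print(x_bar, y_bar, s_xy, s_xx, s_yy)  # 3.0 7.0 16.0 8.0 38.0
```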
- Calculate Pearson correlation:

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}} = \frac{16}{\sqrt{8 \cdot 38}} \approx 0.918$$


- Calculate standard deviations:

$$\sigma_x = \sqrt{\frac{8}{3}} \approx 1.633$$

$$\sigma_y = \sqrt{\frac{38}{3}} \approx 3.559$$
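
The correlation and the standard deviations can be verified in the same way. The sketch below assumes population standard deviations (dividing by n = 3, which matches the 8/3 on the slide), so it uses `statistics.pstdev` rather than the sample version `statistics.stdev`.

```python
import math
import statistics

xs = [1, 3, 5]
ys = [2, 9, 10]
x_bar = statistics.mean(xs)
y_bar = statistics.mean(ys)

# Pearson correlation from the sums of deviations.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
r = s_xy / math.sqrt(s_xx * s_yy)

# Population standard deviations (divide by n, not n - 1).
sigma_x = statistics.pstdev(xs)
sigma_y = statistics.pstdev(ys)

print(round(r, 3), round(sigma_x, 3), round(sigma_y, 3))
```

If sample standard deviations were used for both variables instead, the ratio of the two would be unchanged, so the resulting slope would be the same.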




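- Calculate the slope:

$$\beta_1 = r \, \frac{\sigma_y}{\sigma_x} = \frac{16}{\sqrt{8 \cdot 38}} \cdot \frac{\sqrt{38/3}}{\sqrt{8/3}} = \frac{16}{8} = 2$$
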
- Calculate the y-intercept:

$$\beta_0 = \bar{y} - \beta_1 \bar{x} = 7 - (2)(3) = 1$$


- We have our line!

$$y = \beta_1 x + \beta_0 = 2x + 1$$
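
The whole calculation can be wrapped in one small function. This is a sketch that follows the formulas used in the slides (slope as r times the ratio of standard deviations, then the intercept from the means); the function name `fit_line` is hypothetical.

```python
import math


def fit_line(xs, ys):
    """Return (slope, intercept) of the best-fit line for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)

    r = s_xy / math.sqrt(s_xx * s_yy)   # Pearson correlation
    sigma_x = math.sqrt(s_xx / n)       # population standard deviations
    sigma_y = math.sqrt(s_yy / n)

    slope = r * sigma_y / sigma_x
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line([1, 3, 5], [2, 9, 10])
print(round(slope, 6), round(intercept, 6))  # 2.0 1.0
```

The results are rounded before printing because the square roots introduce tiny floating-point error. Algebraically the slope simplifies to `s_xy / s_xx`, which avoids the square roots entirely.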
Surface and Contour Plots of SSE(m, b)


[Figure: surface and contour plots of the error as a function of slope and y-intercept]
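
The idea behind these plots can be sketched numerically: evaluate SSE(m, b) over a grid of slopes and intercepts for the three example points and look for the lowest value. The grid ranges and the 0.25 step below are arbitrary choices made for illustration, not values taken from the figures.

```python
xs = [1, 3, 5]
ys = [2, 9, 10]


def sse(m, b):
    """Sum of squared errors for the line y = m * x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Grid of candidate slopes (0.0 to 4.0) and intercepts (-3.0 to 5.0).
slopes = [i * 0.25 for i in range(17)]
intercepts = [-3 + j * 0.25 for j in range(33)]

# Lowest point of the error surface on this grid, as (SSE, m, b).
best = min((sse(m, b), m, b) for m in slopes for b in intercepts)
print(best)  # (6.0, 2.0, 1.0)
```

The grid minimum sits at slope 2 and intercept 1, matching the line computed by hand. The same grid of SSE values is what a plotting library would draw as a surface or as contour lines.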